|
Expressing Program Analyses in Datalog
PDF Version AbstractDeclarative rule-based languages like Datalog have been shown to be suitable for writing program analyses. We show how implementing Datalog using a data structure called Binary Decision Diagrams enables the computation of precise pointer aliases in real life applications. We then demonstrate how pointer analysis enables important program analyses that build upon these results.IntroductionPointer alias analysis was a long-standing problem in the literature of program analysis [2]. Two aspects of the problem are particularly troublesome:
There is, however, a way out of this. Points-to information is often highly repetitive, and as a result, it can be dramatically compressed. A data structure called Binary Decision Diagrams, or BDDs, represents repetitive data very well, and can produce an exponential reduction in the amount of space required to represent pointer results. BDD-based pointer results are compact enough that a pointer analysis can work out all pointer information a priori and encode its results for a subsequent analysis to use. BDDs are not easy to work with directly. We can hide the details of BDD manipulation by allowing analyses to be written in the logic programming language Datalog and developing a compiler that translates Datalog into optimized BDD code.. In this article we will introduce BDDs, show their mapping to Datalog, and present a concise, precise pointer analysis. We then discuss how to build upon the results of the pointer analysis to build new special-purpose analyses. Binary Decision DiagramsBinary Decision Diagrams are a data structure for compactly representing sets of bit strings. The bit strings in a set are represented as paths through a diagram, stopping when a decision has been reached. Each internal node in the diagram represents a decision of the path taken according to the value of the bit. The leaf node is either 0 or 1 signifying whether the bit pattern is an element of the set. Common bit patterns among members of a set translate to a compact representation.An example is shown in Figure 1. Both diagrams encode the set {110, 111}. The left diagram is "complete" - each bit leads to a new node, and once all bits are processed, a conclusion is reached. The right diagram simplifies this to only the necessary nodes, which compresses the representation to 4 nodes instead of 15. BDDs and DatalogDatalog's structure maps well to the concept of BDDs:
Pointer Analysis with BDDsFor pointer analysis, a program can be abstracted into just a few relations:
variable v may point to heap location h. Likewise, fieldPoints(h1, f, h2) represents that for an object h1, the field h1.f may point to h2. varPoints(v,h) :- varInit(v, h). fieldPoints(h1, f, h2) :- store(v1, f, v2), varPoints(v1, h1), varPoints(v2, h2) :- load(v1, f, v2), varPoints(v1, h1),In object-oriented systems such as Java, pointer information also influences the call graph. Whaley and Lam extend this analysis further to produce an invokes relation computed alongside the points-to sets [4]. The varPoints and invokes relations combined are sufficient to implement a wide range of additional analyses. Program AnalysesOne particularly compelling use of pointer analysis is ot automatically detect errors and security vulnerabilities in programs. Armed with accurate pointer information, many rules governing the correctness or security of programs can be readily expressed in Datalog.As an example, consider the class of problems known as taint analysis. This class includes SQL injection and Cross-site scripting, two of the most common vulnerabilities in Web applications. In these problems, we seek to discover whether user input can flow to some sensitive place such as a system command (letting the user run any program on the server with the privileges of the service) or a database request (letting the user bypass security or alter data). A simple analysis for this is expressible in two lines: badInput(h) :- returns(i, v), varPointsTo(v, h), error(h) :- badInput(h), argument(i, v), varPointsTo(v, h),This looks for calls to getInput and system where the return value of the first is the same as the argument to the second. This is somewhat naive, but this illustrates how analyses can be written in Datalog. Results of analyses can easily be used in more complex analyses. ConclusionProgram analysis is largely a matter of iterative, systematic deduction based on facts about the target program. As such, rule-based systems such as Datalog can represent the core of an analysis easily. Furthermore, binary decision diagrams (BDDs) permit an efficient implementation of many analyses across sizable programs.References[1] M. Berndl, O. Lhotk, F. Qian, L. Hendren, and N. Umanee. Points-to analysis using BDDs. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 103-114, 2003.[2] Michael Hind. Pointer analysis: Haven't we solved this problem yet? In Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE'01), pages 54-61, 2001. [3] John Whaley et al. The bddbddb analysis system. http://bddbddb.sourceforge.net/. [4] John Whaley and Monica S. Lam. Cloning-based context-sensitive pointer alias analysis using Binary Decision Diagrams. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI), pages 131-144, 2004. |